A Complete Expressiveness Hierarchy for Subgraph GNNs via Subgraph Weisfeiler-Lehman Tests
Recently, subgraph GNNs have emerged as an important direction for developing
expressive graph neural networks (GNNs). While numerous architectures have been
proposed, so far there is still a limited understanding of how various design
paradigms differ in terms of expressive power, nor is it clear what design
principle achieves maximal expressiveness with minimal architectural
complexity. Targeting these fundamental questions, this paper conducts a
systematic study of general node-based subgraph GNNs through the lens of
Subgraph Weisfeiler-Lehman Tests (SWL). Our central result is to build a
complete hierarchy of SWL with strictly growing expressivity. Concretely, we
prove that any node-based subgraph GNN falls into exactly one of six SWL
equivalence classes, among which one class achieves the maximal
expressive power. We also study how these equivalence classes differ in terms
of their practical expressiveness such as encoding graph distance and
biconnectivity. In addition, we give a tight expressivity upper bound of all
SWL algorithms by establishing a close relation with localized versions of
Folklore WL tests (FWL). Overall, our results provide insights into the power
of existing subgraph GNNs, guide the design of new architectures, and point out
their limitations by revealing an inherent gap with the 2-FWL test. Finally,
experiments on the ZINC benchmark demonstrate that subgraph GNNs inspired by
the maximally expressive class can significantly outperform prior
architectures despite their great simplicity.

Comment: 74 pages, 13 figures
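The Subgraph Weisfeiler-Lehman tests studied here generalize the classical 1-WL color-refinement procedure. As background, 1-WL can be sketched as follows (an illustrative implementation with our own function names, not the paper's SWL variants):

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-dimensional Weisfeiler-Lehman color refinement.

    adj: dict mapping node -> list of neighbors.
    Returns the multiset of final node colors as a Counter.
    """
    # All nodes start with the same color.
    color = {v: 0 for v in adj}
    for _ in range(rounds):
        # A node's new color combines its old color with the
        # multiset of its neighbors' colors.
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v])))
               for v in adj}
        # Re-index the signatures to small integers.
        index = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        color = {v: index[sig[v]] for v in adj}
    return Counter(color.values())

def wl_distinguishes(adj1, adj2, rounds=3):
    # 1-WL deems two graphs non-isomorphic iff their color
    # histograms differ after refinement.
    return wl_colors(adj1, rounds) != wl_colors(adj2, rounds)
```

For example, 1-WL cannot distinguish a 6-cycle from two disjoint triangles (both are 2-regular), which is exactly the kind of limitation that motivates more expressive tests such as SWL.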
Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective
Recent studies have discovered that Chain-of-Thought prompting (CoT) can
dramatically improve the performance of Large Language Models (LLMs),
particularly when dealing with complex tasks involving mathematics or
reasoning. Despite the enormous empirical success, the underlying mechanisms
behind CoT and how it unlocks the potential of LLMs remain elusive. In this
paper, we take a first step towards theoretically answering these questions.
Specifically, we examine the capacity of LLMs with CoT in solving fundamental
mathematical and decision-making problems. We start by giving an impossibility
result showing that any bounded-depth Transformer cannot directly output
correct answers for basic arithmetic/equation tasks unless the model size grows
super-polynomially with respect to the input length. In contrast, we then prove
by construction that autoregressive Transformers of a constant size suffice to
solve both tasks by generating CoT derivations using a commonly-used math
language format. Moreover, we show LLMs with CoT are capable of solving a
general class of decision-making problems known as Dynamic Programming, thus
justifying CoT's power in tackling complex real-world tasks. Finally, extensive
experiments on four tasks show that, while Transformers always fail to predict
the answers directly, they can consistently learn to generate correct solutions
step-by-step given sufficient CoT demonstrations.

Comment: 33 pages
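The arithmetic setting can be illustrated with a toy chain-of-thought evaluator that reduces one innermost sub-expression per step, mirroring the kind of step-by-step derivation the paper's Transformers learn to generate (a hypothetical sketch with our own names, not the paper's construction):

```python
import re

def cot_steps(expr):
    """Produce a CoT-style derivation for a fully parenthesized
    integer expression over +, -, *, reducing one innermost
    sub-expression per step.

    cot_steps("((2+3)*(4-1))")
      -> ["((2+3)*(4-1))", "(5*(4-1))", "(5*3)", "15"]
    """
    steps = [expr]
    # Matches an innermost parenthesized binary operation, e.g. "(2+3)".
    pat = re.compile(r"\((-?\d+)([+\-*])(-?\d+)\)")
    while True:
        m = pat.search(expr)
        if not m:
            break
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        val = a + b if op == "+" else a - b if op == "-" else a * b
        # Splice the computed value back in and record the new step.
        expr = expr[:m.start()] + str(val) + expr[m.end():]
        steps.append(expr)
    return steps
```

Each intermediate string plays the role of one CoT step: the final answer is never produced in one shot, but emerges from a sequence of local reductions, which is the regime where the paper shows constant-size autoregressive Transformers suffice.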